Text Alignment in the Real World: Improving Alignments of Noisy Translations Using Common Lexical Features, String Matching Strategies and N-Gram Comparisons

Authors

  • Mark W. Davis
  • Ted Dunning
  • William C. Ogden
Abstract

Alignment methods based on byte-length comparisons of alignment blocks have been remarkably successful for aligning good translations from legislative transcriptions. For noisy translations in which the parallel text of a document has significant structural differences, byte-alignment methods often do not perform well. The Pan American Health Organization (PAHO) corpus is a series of articles that were first translated by machine methods and then improved by professional translators. Many of the Spanish PAHO texts do not share formatting conventions with the corresponding English documents, refer to tables in stylistically different ways, and contain extraneous information. A method based on a dynamic programming framework, but using a decision criterion derived from a combination of byte-length ratio measures, hard matching of numbers, string comparisons, and n-gram co-occurrence matching, substantially improves the performance of the alignment process.
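The combination described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the weights, the skip penalty, and the use of character-trigram Jaccard overlap as a stand-in for n-gram co-occurrence matching are all assumptions made for illustration. The sketch scores each candidate block pair from a byte-length ratio, exact number matching, and n-gram overlap, then uses dynamic programming to choose the best sequence of 1-1 matches and skipped blocks.

```python
import math
import re


def byte_length_ratio(a, b):
    # Ratio of the shorter to the longer block, measured in UTF-8 bytes.
    la, lb = len(a.encode("utf-8")), len(b.encode("utf-8"))
    return min(la, lb) / max(la, lb) if la and lb else 0.0


def number_match(a, b):
    # Hard matching of numbers: Jaccard overlap of the digit strings.
    na, nb = set(re.findall(r"\d+", a)), set(re.findall(r"\d+", b))
    if not na and not nb:
        return 0.5  # assumed neutral value when neither block has numbers
    return len(na & nb) / len(na | nb)


def ngram_overlap(a, b, n=3):
    # Character n-gram Jaccard overlap (a simplification, see lead-in).
    a, b = a.lower(), b.lower()
    ga = {a[i:i + n] for i in range(len(a) - n + 1)}
    gb = {b[i:i + n] for i in range(len(b) - n + 1)}
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def block_score(a, b):
    # Hypothetical weights; a real system would tune them on held-out data.
    return (0.3 * byte_length_ratio(a, b)
            + 0.4 * number_match(a, b)
            + 0.3 * ngram_overlap(a, b))


def align(src, tgt, skip_penalty=0.3):
    """Return (src_index or None, tgt_index or None) pairs in document order."""
    n, m = len(src), len(tgt)
    best = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            here = best[i][j]
            if here == -math.inf:
                continue
            moves = []
            if i < n and j < m:  # 1-1 match
                moves.append((i + 1, j + 1, here + block_score(src[i], tgt[j])))
            if i < n:  # source block with no counterpart
                moves.append((i + 1, j, here - skip_penalty))
            if j < m:  # extraneous target block
                moves.append((i, j + 1, here - skip_penalty))
            for ni, nj, value in moves:
                if value > best[ni][nj]:
                    best[ni][nj] = value
                    back[ni][nj] = (i, j)
    # Trace back from the end of both documents to recover the path.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((pi if pi < i else None, pj if pj < j else None))
        i, j = pi, pj
    pairs.reverse()
    return pairs
```

Calling `align(english_blocks, spanish_blocks)` on two lists of text blocks returns index pairs, with `None` marking a block that has no counterpart on the other side. This is how extraneous material in one version can be skipped rather than forced into a match.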

Similar resources

Investigating the Social Practice of Persian Translations of ‘The Girl You Left Behind’ through Translators’ Lexical and Grammatical Strategies

The present study aimed to shed light upon the differences of social practice of Persian translations of The Girl You Left Behind written by Jojo Moyes (2012) with original text in English based on Fairclough's (1995) model. In this regard, through a careful analysis of the source and target texts, English social practice instances were selected along with their Persian equivalents as the cor...

Equivalency and Non-equivalency of Lexical Items in English Translations of Nahj al-balagha

Lexical items play a key role in both language in general and translation in particular. Likewise, equivalence is a controversial concept discussed so widely in translation studies. Some theorists deem it to be fundamental in translation theory and define translation in terms of equivalence. The aim of this study is to identify the problems of lexical gaps in two translations of Nahj al-ba...

Free Resources And Advanced Alignment For Cross-Language Text Retrieval

For the Cross-Language Text Retrieval Track in TREC 6, NMSU experimented with a new approach to deriving translation equivalents from parallel text databases, and also investigated performing automatic, dictionary-based translation of query terms by using a dictionary that could be queried remotely via the World Wide Web. The new approach to building bilingual translation lexicons involved alig...

Combining String and Context Similarity for Bilingual Term Alignment from Comparable Corpora

Automatically compiling bilingual dictionaries of technical terms from comparable corpora is a challenging problem, yet with many potential applications. In this paper, we exploit two independent observations about term translations: (a) terms are often formed by corresponding sub-lexical units across languages and (b) a term and its translation tend to appear in similar lexical context. Based ...

BLEUÂTRE: Flattening Syntactic Dependencies for MT Evaluation

This paper describes a novel approach to syntactically-informed evaluation of machine translation (MT). Using a statistical, treebank-trained parser, we extract word-word dependencies from reference translations and then compile these dependencies into a representation that allows candidate translations to be evaluated by string comparisons, as is done in n-gram approaches to MT evaluation. This...

Publication date: 1995